80 research outputs found

    Qualitative analysis of post-editing for high quality machine translation

    In the context of massive adoption of Machine Translation (MT) by human localization services in Post-Editing (PE) workflows, we analyze the activity of post-editing high-quality translations through a novel PE analysis methodology. We define and introduce a new unit for evaluating post-editing effort, the Post-Editing Action (PEA), for which we provide human evaluation guidelines and propose a process to evaluate these PEAs automatically. We applied this methodology to data sets from two technologically different MT systems and show that more than 35% of the remaining effort can be saved by introducing global PEAs and edit propagation.
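
    Purely as an illustration (not the paper's actual PEA definition; all names here are hypothetical), the sketch below approximates post-editing actions as word-level edit operations and counts operations that recur across segments, the kind of repetition that a global PEA with edit propagation could resolve in a single action.

```python
from collections import Counter
from difflib import SequenceMatcher

def edit_actions(mt: str, post_edit: str):
    """Approximate post-editing actions as word-level edit operations."""
    mt_tok, pe_tok = mt.split(), post_edit.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, mt_tok, pe_tok).get_opcodes():
        if tag != "equal":  # keep only replace/insert/delete operations
            ops.append((tag, tuple(mt_tok[i1:i2]), tuple(pe_tok[j1:j2])))
    return ops

def recurring_actions(segment_pairs):
    """Identical operations recurring across segments are candidates for a
    single 'global' action propagated to every affected segment."""
    counts = Counter()
    for mt, pe in segment_pairs:
        counts.update(edit_actions(mt, pe))
    return {op: n for op, n in counts.items() if n > 1}

pairs = [
    ("open the colour settings", "open the color settings"),
    ("the colour settings are hidden", "the color settings are hidden"),
]
print(recurring_actions(pairs))
# {('replace', ('colour',), ('color',)): 2} -> one propagated edit fixes both
```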

    Bilexical embeddings for quality estimation

    © 2017 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W17-4760. This work was supported by the QT21 project (H2020 No. 645452).

    Multimodal quality estimation for machine translation

    © 2020 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/2020.acl-main.114. We propose approaches to Quality Estimation (QE) for Machine Translation that explore both text and visual modalities for Multimodal QE. We compare various multimodality integration and fusion strategies. For both sentence-level and document-level predictions, we show that state-of-the-art neural and feature-based QE frameworks obtain better results when using the additional modality. This work was supported by funding from both the Bergamot project (EU H2020 Grant No. 825303) and the MultiMT project (EU H2020 ERC Starting Grant No. 678017).
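
    As a minimal sketch of one possible fusion strategy, assuming projected-and-concatenated text and image features (dimensions and class names are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LateFusionQE(nn.Module):
    """Toy sentence-level QE scorer fusing a text encoding with a visual
    feature vector by projection + concatenation."""
    def __init__(self, text_dim=768, visual_dim=2048, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, text_feats, visual_feats):
        fused = torch.cat(
            [self.text_proj(text_feats), self.visual_proj(visual_feats)], dim=-1
        )
        return self.scorer(fused).squeeze(-1)  # one quality score per sentence

model = LateFusionQE()
scores = model(torch.randn(4, 768), torch.randn(4, 2048))  # batch of 4 sentences
```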

    Sheffield systems for the English-Romanian translation task

    © 2016 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W16-2307. This work was supported by the QT21 project (H2020 No. 645452).

    Quality in, quality out: learning from actual mistakes

    © 2020 The Authors. Published by the European Association for Machine Translation. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://www.aclweb.org/anthology/2020.eamt-1.16/. Approaches to Quality Estimation (QE) of machine translation have shown promising results at predicting quality scores for translated sentences. However, QE models are often trained on noisy approximations of quality annotations derived from the proportion of post-edited words in translated sentences, instead of direct human annotations of translation errors. The latter is a more reliable ground truth but more expensive to obtain. In this paper, we present the first attempt to model the task of predicting the proportion of actual translation errors in a sentence while minimising the need for direct human annotation. For that purpose, we use transfer learning to leverage large-scale noisy annotations and small sets of high-fidelity human-annotated translation errors to train QE models. Experiments on four language pairs and translations obtained by statistical and neural models show consistent gains over strong baselines. This work was supported by the Bergamot project (EU H2020 Grant No. 825303).
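
    A minimal sketch of the two-stage transfer-learning recipe the abstract describes, assuming a toy feature-based regressor and random stand-in data in place of the real noisy and gold annotations:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, loader, epochs, lr):
    """One regression phase: minimise MSE against quality labels."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, labels in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(feats).squeeze(-1), labels).backward()
            opt.step()

model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
# Stage 1: large set of noisy, post-editing-derived labels (HTER-like).
noisy = TensorDataset(torch.randn(512, 768), torch.rand(512))
train(model, DataLoader(noisy, batch_size=32), epochs=3, lr=1e-4)
# Stage 2: small set of high-fidelity human error proportions; a lower
# learning rate helps keep what was learnt from the noisy data.
gold = TensorDataset(torch.randn(64, 768), torch.rand(64))
train(model, DataLoader(gold, batch_size=16), epochs=10, lr=1e-5)
```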

    USFD’s phrase-level quality estimation systems

    © 2016 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W16-2386. Logacheva, V., Blain, F. and Specia, L. (2016) USFD’s phrase-level quality estimation systems. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, Bojar, O., Buck, C., Chatterjee, R., Federmann, C. et al. (eds.) Stroudsburg, PA: Association for Computational Linguistics, pp. 800-805. This work was supported by the EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) and QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) projects.

    Phrase level segmentation and labelling of machine translation errors

    © 2016 The Authors. Published by the European Language Resources Association (ELRA). This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://www.aclweb.org/anthology/L16-1356/. This paper presents our work towards a novel approach to Quality Estimation (QE) of machine translation based on sequences of adjacent words, so-called phrases. This new level of QE aims to provide a natural balance between word-level and sentence-level QE, which are respectively too fine-grained and too coarse for some applications. However, phrase-level QE poses an intrinsic challenge: how to segment a machine translation into the sequences of words (contiguous or not) that represent errors. We discuss three possible segmentation strategies to automatically extract erroneous phrases and evaluate them against phrase-level annotations produced by humans, using a new dataset collected for this purpose. The authors would like to thank all the annotators who helped to create the first version of gold-standard annotations at phrase level. This work was supported by the QT21 (H2020 No. 645452, Lucia Specia, Frédéric Blain) and EXPERT (EU FP7 Marie Curie ITN No. 317471, Varvara Logacheva) projects.
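
    One simple conceivable strategy, sketched here as an assumption rather than one of the paper's three, merges runs of adjacent words tagged BAD into a single contiguous erroneous phrase:

```python
def bad_phrases(tokens, tags):
    """Contiguous-runs heuristic: merge adjacent BAD-tagged words into one
    erroneous phrase (the contiguous variant of phrase segmentation)."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "BAD":
            current.append(tok)
        elif current:  # a run of BAD words just ended
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

tokens = "the cat sat in on the mat".split()
tags = ["OK", "OK", "OK", "BAD", "BAD", "OK", "OK"]
print(bad_phrases(tokens, tags))  # ['in on']
```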

    Sheffield submissions for the WMT18 quality estimation shared task

    © 2018 The Authors. Published by Association for Computational Linguistics. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: http://dx.doi.org/10.18653/v1/W18-6463. In this paper we present the University of Sheffield submissions to the WMT18 Quality Estimation shared task. We discuss our submissions to all four sub-tasks; ours is the only team to participate in all language pairs and variations (37 combinations). Our systems show competitive results and outperform the baseline in nearly all cases. Carolina Scarton is supported by the EC project SIMPATICO (H2020-EURO-6-2015, grant number 692819). Frederic Blain is supported by the Amazon Academic Research Awards program.

    Tailoring Domain Adaptation for Machine Translation Quality Estimation

    While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues -- data scarcity and domain mismatch -- this paper combines domain adaptation and data augmentation within a robust QE system. Our method first trains a generic QE model and then fine-tunes it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and superior performance in zero-shot learning scenarios compared to state-of-the-art baselines. Accepted to EAMT 2023.
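
    The abstract does not spell out how generic knowledge is retained during fine-tuning; one common mechanism, shown below purely as a hedged sketch with stand-in data, is an L2-SP-style penalty that pulls the fine-tuned weights back toward the generic checkpoint:

```python
import torch
import torch.nn as nn

# Assume `model` was already trained on generic-domain QE data.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
generic = {k: v.clone() for k, v in model.state_dict().items()}  # frozen snapshot

def l2_sp_penalty(model, reference, strength=0.01):
    """Pull fine-tuned weights back toward the generic snapshot, one common
    way to retain generic knowledge during in-domain fine-tuning."""
    return strength * sum(
        (p - reference[name]).pow(2).sum() for name, p in model.named_parameters()
    )

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
feats, labels = torch.randn(16, 768), torch.rand(16)  # stand-in in-domain batch
opt.zero_grad()
loss = nn.functional.mse_loss(model(feats).squeeze(-1), labels)
(loss + l2_sp_penalty(model, generic)).backward()
opt.step()
```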